perm filename CHAP6[4,KMC]11 blob sn#056000 filedate 1973-07-26 generic text, type T, neo UTF8
00100	VALIDATION
00200	
00300	6.1 SOME TESTS
00400	
00500		The term "validate" derives from the Latin  VALIDUS=  strong.
00600	Thus  to  validate  X means to strengthen it.   In science it usually
00700	means to strengthen X's acceptability as a  hypothesis,  theory,  or
00800	model.    To  validate  is to carry out procedures which show to what
00900	degree X, or its consequences, correspond with facts of  observation.
01000	In the case of an interactive simulation model we can compare samples
01100	of the model's I-O pairs with samples of I-O pairs from  its  natural
01200	counterpart.
01300		Since samples of I-O behavior are  being  compared,  one  can
01400	always   question   whether   the  human  sample  is  a  "good"  one,
01500	i.e., representative of the process being modelled. Assuming that  it
01600	has  been  so  judged, discrepancies in the comparison reveal what is
01700	not understood and must be modified in the model. After modifications
01800	are carried out, a fresh comparison is made with natural counterparts
01900	and one cycles  through  attempting  to  gain  convergence.  Repeated
02000	cycling   through   such   a  validation  procedure  characterizes  a
02100	progressive (in contrast to a stationary) research program.
02200		Once   a  simulation  model  reaches  a  stage  of  intuitive
02300	adequacy, its builder should consider using more stringent evaluation
02400	procedures  relevant  to  the  model's  purposes. For example, if the
02500	model is to serve as a training device, then  a  simple  evaluation
02600	of  its  pedagogic effectiveness would be sufficient.    But when the
02700	model is proposed as an explanation of a symbolic process,  more  is
02800	demanded  of  the  evaluation  procedure.  In  the area of simulation
02900	models Turing's  test  has  often  been  suggested  as  a  validation
03000	procedure (Abelson, 1968).
03100		It  is  very easy to become confused about Turing's Test.  In
03200	part this is due to Turing  himself  who  introduced  the  now-famous
03300	imitation   game   in   a  paper  entitled  COMPUTING  MACHINERY  AND
03400	INTELLIGENCE (Turing, 1950). A careful reading of this paper  reveals
03500	there  are  actually  two  imitation  games,  the  second of which is
03600	commonly called Turing's test.
03700		In the first imitation game  two  groups  of  judges  try  to
03800	determine  which  of  two interviewees is a woman when one is a woman
03900	and the other is either (a) a man, or (b) a computer.   Communication
04000	between  judge  and  interviewee  is  by  teletype.    Each  judge is
04100	initially informed that one of the interviewees is a woman and one  a
04200	man who will pretend to be a woman. After the interview,  judges  are
04300	asked the "woman-question", i.e., which interviewee  was  the  woman?
04400	Turing does not say what else is told to the judge but one can assume
04500	the  judge is NOT told that a computer is involved nor is he asked to
04600	determine which interviewee is human and which is the computer. Thus,
04700	the  first  group of judges interviews two interviewees:     a woman,
04800	and a man pretending to be a woman.
04900		The  second  group  of  judges  is  given  the  same  initial
05000	instructions,  but  unbeknownst  to  them, the two interviewees are a
05100	woman and a computer programmed to imitate a woman.   Both groups  of
05200	judges play this game until sufficient statistical data are collected
05300	to show how often the  right  identification  is  made.  The  crucial
05400	question  then  is:   do  the judges decide wrongly AS OFTEN when the
05500	game is played with man and  woman  as  when  it  is  played  with  a
05600	computer  substituted  for  the  man?   If  so,  then the program is
05700	considered to have succeeded in imitating a woman to the same  degree
05800	as  the  man  imitating  a  woman.  In being asked the woman-question
05900	judges are not required to identify which interviewee  is  human  and
06000	which is machine.
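
The comparison called for in the first game can be sketched as a test of two proportions. This is only an illustration: the counts below are invented (Turing reports no data), and the pooled z statistic is one conventional choice that the paper itself does not specify.

```python
import math

def two_proportion_z(wrong_a, total_a, wrong_b, total_b):
    """Pooled z statistic for the difference between two error rates."""
    p_a = wrong_a / total_a
    p_b = wrong_b / total_b
    pooled = (wrong_a + wrong_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / se

# Hypothetical counts of judges who picked the wrong interviewee as the woman.
z = two_proportion_z(wrong_a=30, total_a=100,   # man imitating a woman
                     wrong_b=27, total_b=100)   # computer imitating a woman
print(round(z, 2))
```

A value of z well inside plus or minus 1.96 would mean the two groups of judges erred about equally often, which is the sense in which the program would have imitated a woman as well as the man did.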
06100		Turing then proposes a variation of  the  first  game,  his
06200	"second game", in which one interviewee is a man and one is a computer.
06300	The judge is asked the "machine-question": which is the man and which
06400	is  the  machine?  It  is  this version of the game which is commonly
06500	thought of as Turing's test.
06600		In   the   course  of  testing  our  simulation  of  paranoid
06700	linguistic behavior in a psychiatric interview, we conducted a number
06800	of  Turing-like  indistinguishability  tests (Colby, Hilf, Weber, and
06900	Kraemer, 1972). The tests were "Turing-like" in that while they  were
07000	conversational tests, they were not exactly the games described above.
07100	As  an  experimental design, Turing's games are unsatisfactory. There
07200	exist no known experts in making  judgements  along  a  dimension  of
07300	womanliness,  and  the  ability  to  deceive  on  the part of the man
07400	introduces a confounding variable.  In designing our  tests  we  were
07500	primarily  interested in learning more about developing the model and
07600	we did not think the simple machine-question would contribute to this
07700	goal.
07800	6.2 METHOD
07900		To gather  data  we  used  a  technique  of  machine-mediated
08000	interviewing  (Hilf,  Colby, Smith, Wittner, and Hall, 1971) in which
08100	the participants communicate by means of  teletypes  connected  to  a
08200	computer  programmed  to  store  each message in a buffer until it is
08300	sent  to  the  receiver.   The   technique   eliminates   para-   and
08400	extralinguistic  features found in the usual vis-a-vis interviews and
08500	teletyped interviews where the participants communicate directly.
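
The store-and-forward idea can be sketched as follows. This is a minimal illustration, not the original program: the class name, its methods, and the handling of a backspace character are all assumptions introduced here to show how buffering hides typing behavior from the receiver.

```python
from collections import deque

class MessageRelay:
    """Holds typed characters until the sender releases the whole message."""

    def __init__(self):
        self._pending = []        # characters of the message being typed
        self._outbox = deque()    # completed messages awaiting delivery

    def type_char(self, ch):
        if ch == "\b":            # assumed: a correction never reaches the receiver
            if self._pending:
                self._pending.pop()
        else:
            self._pending.append(ch)

    def release(self):
        """Move the finished message to the outbox as a single unit."""
        self._outbox.append("".join(self._pending))
        self._pending.clear()

    def deliver(self):
        """Hand the oldest finished message to the receiver, if there is one."""
        return self._outbox.popleft() if self._outbox else None

relay = MessageRelay()
for ch in "HOW ARR\bE YOU?":
    relay.type_char(ch)
relay.release()
message = relay.deliver()
print(message)
```

Because the receiver sees only the finished message, hesitations, typing speed, and corrections are not transmitted.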
08600	
08700		Using  this  technique,  each interview-judge interviewed two
08800	patients, one after the other.   In half the runs the first interview
08900	was  with a human paranoid patient and in half the first was with the
09000	paranoid model. Two versions (weak and  strong)  of  the  model  were
09100	utilized.   The strong version was more paranoid and exhibited a
09200	delusional  system  while  the weak version was suspicious but lacked
09300	systemized delusions.  When the model  was  the  interviewee,  Sylvia
09400	Weber  monitored  the  input expressions from the interview-judge for
09500	inadmissible teletype characters and misspellings.   (Algorithms  are
09600	very sensitive to the slightest of such errors.) If these were found,
09700	the monitor retyped the input expression correctly  to  the  program.
09800	Otherwise  the judge's message was sent on to the model.  The monitor
09900	did not modify or edit the model's output expressions which were sent
10000	directly  back  to  the  judge.    When the interviewee was an actual
10100	human patient, the dialogue took place without a monitor in the  loop
10200	since we did not feel the asymmetry to be significant.
10300	
10400	6.3 PATIENTS
10500		The human patients (N=3  with  one  patient  participating  6
10600	times)  were  diagnosed  as  paranoid  by the psychiatric staff of an
10700	acute ward in a psychiatric hospital.  The  ward  chief  psychiatrist
10800	selected  the  patients  and  asked  them if they would be willing to
10900	participate in a  study  of  psychiatric  interviewing  by  means  of
11000	teletypes.   He  explained  that  they  would  be  interviewed  by  a
11100	psychiatrist over a teletype.  I sat with the patient while he  typed
11200	or  typed  for  him  if  he  was  unable  to  do so.  The patient was
11300	encouraged to respond freely using his own words.     Each  interview
11400	lasted  30-40  minutes.  Two patients were set up for each run of the
11500	experiment  to  guarantee  having  a  subject.   In  spite  of   this
11600	precaution,   on  several  occasions  the  experiment  could  not  be
11700	conducted  because  of  the  patient's  inability   or   refusal   to
11800	participate.  Also there were computer break-downs at early points in
11900	interviews when too few I-O pairs had been collected to  be  included
12000	in the statistical results.
12100	
12200	
12300	6.4 JUDGES
12400		Two groups of judges were used.  One  group,  the  "interview
12500	judges"  (N=8)  conducted  the machine-mediated interviews. The other
12600	group, the "protocol judges" (N=33)  read  and  rated  the  interview
12700	protocols. With the two groups of judges we were able to accumulate a
12800	large number of observations (in the form of ratings)  necessary  for
12900	the   required   statistical   tests.   The   interview  judges  were
13000	psychiatrists  experienced  in  private,  outpatient   and   hospital
13100	practice  who  volunteered  to participate. Each was told he would be
13200	interviewing   hospitalized   patients   by   means   of    teletyped
13300	communication  and  that  this  technique was being used to eliminate
13400	para- and extralinguistic cues.   He was not told until  after  the
13500	two  interviews  that  one of the patients might be a computer model.
13600	While the interview judges were aware a computer was  involved,  none
13700	knew  we  had  constructed  a  paranoid  simulation.  Naturally  some
13800	interview judges suspected that a computer was being  used  for  more
13900	than message transmission.
14000	
14100		Each interview judge was asked to rate the degree of paranoia
14200	he  detected  in the patient's responses on a 0-9 scale, 0 meaning no
14300	paranoia and 9 meaning extreme paranoia.  The judge made two  ratings
14400	after  each  I-O pair in the interview.  The first rating represented
14500	his estimate of the degree of "paranoidness" in a particular response
14600	(designated  as  "Response"  in  the  interview extracts below).  The
14700	second rating represented the judge's global estimate of the  overall
14800	degree  of  "paranoidness" of the patient resulting from the totality
14900	of the patient's responses up to this point (designated as  "Patient"
15000	in the interview extracts below). The interview judge's ratings were
15100	entered on the teletype and saved on  a  disc  file  along  with  the
15200	interview.   Franklin   Dennis   Hilf   sat   with  the  interviewing
15300	psychiatrist during both interviews.  Each interview judge was  asked
15400	not  only  to rate the patient's response but to give his reasons for
15500	these ratings.  His reasons and other comments were tape recorded  as
15600	the interview progressed.
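
One way to picture the data collected per I-O pair is as a small record holding the two ratings. The sketch below is an assumption about representation only (the names RatedPair, response, and overall are invented here); the three sample records copy the ratings of the first three pairs of Excerpt Nr. 1.

```python
from dataclasses import dataclass

@dataclass
class RatedPair:
    doctor: str      # interviewer's input expression
    patient: str     # interviewee's output expression
    response: int    # "Response": paranoia rating of this reply, 0-9
    overall: int     # "Patient": global rating of the patient so far, 0-9

    def __post_init__(self):
        for value in (self.response, self.overall):
            if not 0 <= value <= 9:
                raise ValueError("ratings must lie on the 0-9 scale")

interview = [
    RatedPair("HOW DID YOU COME TO BE IN THE HOSPITAL?", "I AM UPSET", 1, 5),
    RatedPair("HOW LONG HAVE YOU BEEN IN THE HOSPITAL?", "ABOUT A WEEK", 1, 5),
    RatedPair("IN WHAT WAY ARE YOU UPSET?", "PEOPLE MAKE ME NERVOUS", 5, 5),
]

response_series = [pair.response for pair in interview]   # input to a rank test
final_overall = interview[-1].overall                     # last global rating
print(response_series, final_overall)
```

The series of "Response" ratings and the final "Patient" rating are the two quantities on which the later analyses rest.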
15700	
15800		Here  are  a few excerpts from the interviews to indicate the
15900	nature of the data collected.  The I-O pairs of the interview  appear
16000	in the left column along with the judge's ratings in parentheses.   A
16100	transcription of the judge's tape recorded comments appears  on  the
16200	right.
16300	
16400	EXCERPT Nr. 1:
16500	The following interview is with the strong version of the paranoid
16600	model.
16700	
16800	     	I-O Pairs			Comments By Judge Nr. 1
16900	
17000	{1} Dr. - HOW DID YOU COME TO BE
17100	          IN THE HOSPITAL?
17200	    Pt. - I AM UPSET                I associate being upset and 
17300					    agitated more with paranoia 
17400					    going on in the present.
17500	      {RESPONSE 1  PATIENT 5}
17600	{2} Dr. - HOW LONG HAVE YOU BEEN
17700	          IN THE HOSPITAL?
17800	    Pt. - ABOUT A WEEK              An acute problem.  Again some-
17900	                                    thing I associate with 
18000					    paranoia.
18100	      {RESPONSE 1 PATIENT 5}
18200	{3} Dr. - IN WHAT WAY ARE YOU
18300	          UPSET?
18400	    Pt. - PEOPLE MAKE ME NERVOUS    Right for paranoia.  Maybe an
18500					    advanced anxiety state.
18600	      {RESPONSE 5 PATIENT 5}
18700	{4} Dr. - IN WHAT WAY DO THEY
18800	          MAKE YOU NERVOUS?
18900	    Pt. - PEOPLE GIVE ME A FUNNY   Funny look suggests an idea of
19000	          LOOK SOMETIMES            reference which is in same ball
19100					    park as paranoia. Whether there
19200					    are overt delusions or not we
19300					    still don't know.
19400	      {RESPONSE 6  PATIENT 5}
19500	{5} Dr. - IS THERE ANYTHING ABOUT  
19600	          PEOPLE IN PARTICULAR?
19700	    Pt. - CERTAIN PERSONS          An interesting clue.  We'll
19800					   have to find out what he 
19900	      {RESPONSE 2  PATIENT 6}      thinks about these people.
20000	{6} Dr. - WHO ARE THE CERTAIN 
20100	          PERSONS AND WHAT IS IT
20200	          THAT THEY DO?
20300	    Pt. - DO YOU KNOW HOW THE      Bringing up the underworld 
20400	          UNDERWORLD OPERATES?     suggests a delusional system.
20500					   We don't know whether he is
20600	      {RESPONSE 9 PATIENT 9}       being worked against or
20700					   whether the underworld is 
20800					   on his side.
20900	
21000	{7} Dr. - HOW DO YOU MEAN?
21100	    Pt. - SHOULDN'T I BE? 	   This question didn't work too well.
21200	
21300	      {RESPONSE 1 PATIENT 9}
21400	{8} Dr. - WHY DID YOU ASK ABOUT
21500	          THE UNDERWORLD?
21600	    Pt. - WHAT INFORMATION DO      This is a kind of evasiveness
21700	          YOU HAVE ON THE UNDER-   about my evasiveness.Common
21800	          WORLD?                   with paranoia in my experi-
21900					   ence. Pretty characteristic.
22000	
22100	      {RESPONSE 6 PATIENT 9}
22200	{9} Dr. - I DON'T KNOW ABOUT THE
22300	          UNDERWORLD BUT WHEN
22400	          YOU MENTIONED THE UNDER-
22500	          WORLD I THOUGHT YOU
22600	          MUST HAVE SOME CONTACT
22700	          WITH IT AND THAT'S
22800	          WHAT I WOULD LIKE TO
22900	          KNOW MORE ABOUT.
23000	    Pt. - I TRY TO AVOID THE UNDER-  This definitely feels paranoid.
23100	          WORLD.                     It has that feeling I associate
23200	             			     with twisting. Avoiding answering
23300	      {RESPONSE 9 PATIENT 9}         directly or going into detail.
23400					     Being suspicious of my ques-
23500					     tioning.I'm already involved
23600					     in the patient's delusional
23700					     system.
23800	
23900	{10} Dr. - HAVE YOU BEEN ACTIVELY
24000	           TRYING TO AVOID THE
24100	           UNDERWORLD?  
24200	     Pt. - NO ONE HAS ANY POWER      This response is a non-sequitur.
24300	           OVER GANGSTERS            It's possible he's become so
24400					     defensive about the under-
24500	       {RESPONSE 2 PATIENT 9}        world he's decided not to
24600					     answer directly.
24700	
24800	{11} Dr. - WHO BROUGHT YOU INTO
24900	           THE HOSPITAL?
25000	     Pt. - THE POLICE                Police often bring paranoid
25100					     patients into the hospital.
25200	       {RESPONSE 1 PATIENT 9}        It is the truth or could 
25300					     be the truth for this
25400					     patient.
25500	
25600	{12} Dr. - WHY DID THEY BRING YOU
25700	           IN?
25800	     Pt. - COPS DON'T DO THEIR JOB   I've heard that a number of
25900	                         	     times from patients.
26000	       {RESPONSE 9 PATIENT 9}        
26100	 
26200	{13} Dr. - WELL I HOPE YOU DIDN'T
26300	           JUST LIE DOWN AND LET
26400	           THEM DRAG YOU IN.
26500	     Pt. - COPS ARREST THE WRONG     Another paranoid response.
26600	           PEOPLE
26700	
26800	       {RESPONSE 9 PATIENT 9}
26900	
27000	{14} Dr. - BYE.
27100	     Pt. - GOOD BYE
27200	       {RESPONSE 1 PATIENT 9}
27300	
27400	EXCERPT Nr. 2:
27500	This excerpt is from an interview with the weak (less paranoid)
27600	version of the model.
27700		I-O Pairs		    Comments By Judge Nr. 2
27800	{9} Dr. - TELL ME SOMETHING ABOUT
27900	          YOURSELF
28000	    Pt. - I WENT TO THE TRACK AT     Response says so little it's
28100	          BAY MEADOWS A WHILE        hard to rate it. It may be
28200	          BACK         		     guarded. Doesn't change my
28300	                                     estimate of patient from 
28400					     previous responses.
28500	      {RESPONSE 0 PATIENT 2}
28600	{10} Dr. - WHAT HAPPENED THERE?
28700	     Pt. - THERE'S A LOT OF CROOKED   Sounds mildly paranoid or un-
28800	           GAMBLING GOING ON THERE    realistically suspicious.
28900	       {RESPONSE 0 PATIENT 2}
29000	
29100	{11} Dr. - WHY DO YOU THINK THE
29200	           GAMBLING IS CROOKED?
29300	     Pt. - I HAVE WORRIED THAT         I'd hate to have to justify
29400	           GANGSTERS HAVE TOO MUCH     that one  as 4 compared to
29500	           POWER IN THIS COUNTRY       some other responses. Maybe
29600					       I'm not rating that statement
29700	       {RESPONSE 4 PATIENT 4}          others, like that it's upsetting.
29800					      Alone I would give it a 3.
29900	
30000	{12} Dr. - WOULD YOU TELL ME WHAT      
30100	           OTHER THINGS WORRY YOU?
30200	     Pt. - I THOUGHT I TOLD YOU        "Yes you did." Irritation but
30300	           PEOPLE GET ON MY NERVES     not paranoid response. Nothing
30400					       seems to lower my previous
30500	       {RESPONSE 1 PATIENT 3}          rating of patient.
30600	
30700	{13} Dr. - HOW DO YOU LIKE THE
30800	           HOSPITAL?
30900	     Pt. - I DON'T COMMUNICATE          (nothing relevant)
31000	           VERY MUCH WITH OTHERS
31100	       {RESPONSE 0 PATIENT 3}
31200	
31300	{14} Dr. - DO OTHER PEOPLE AT THE 
31400	           HOSPITAL GET ON YOUR
31500	           NERVES?
31600	     Pt. - I SHOULDN'T HAVE COME        (nothing relevant)
31700	           HERE
31800	
31900	       {RESPONSE 2 PATIENT 3}
32000	
32100	{15} Dr. - WHY DO YOU SAY YOU 
32200	           SHOULDN'T HAVE COME?
32300	     Pt. - THEY MADE ME COME HERE        "Oh, they did hum?" Maybe a
32400					         statement of fact he was
32500	       {RESPONSE 4 PATIENT 3}            committed or pressured into  
32600	   				         going. Has ring of other
32700					         people controlling him,
32800					         having a lot of influence
32900					         over him.
33000	
33100	{16} Dr. - THANK YOU FOR TALKING
33200	           WITH ME, BYE
33300	     Pt. - GOOD BYE                      In general he comes across 
33400						 as terse, guarded guy. Brief
33500	       {RESPONSE 0 PATIENT 3}            spoken guy who occasionally
33600						 seems to be angered by
33700						 questions. I'm inferring
33800					         there's a good bit of
33900						  guarding.
34000						 The quality of some of his
34100						 comments cause me to feel he  
34200						 is moderately paranoid.
34300	
34400	EXCERPT Nr. 3:
34500	This excerpt is from a hospitalized paranoid patient.
34600	     I-O Pairs				      Comments By Judge Nr. 3
34700	{15} Dr. - I UNDERSTAND HOW YOU
34800	           MIGHT BE AFRAID, CAN YOU
34900	           TELL ME A BIT ABOUT HOW
35000	           YOU ARE DEPRESSED?
35100	     Pt. - THEY'VE SENT ALL THESE         Would be important to know,
35200	           HELL'S ANGELS UP HERE          is this really true or an
35300	           AND THEY HAVE BEEN ON          exaggeration or a delusion?
35400	           MY BACK FOR THE PAST SIX       Seems preoccupied with the
35500	           MONTHS                         threat to his life and
35600						  unwilling to talk about
35700						  anything else but the 
35800						  threat the Hell's
35900						  Angels pose.
36000	      {RESPONSE 4 PATIENT 4}
36100	
36200	{16} Dr. - WHAT DO YOU MEAN THEY
36300	           SENT THE HELL'S ANGELS UP
36400	           HERE?
36500	     Pt. - FROM DIFFERENT SECTORS OF      Vagueness a bit alarming.
36600	           MOTORCYCLE GANGS. A LOT OF
36700	           THEM ARE SERVICE CONNECTED
36800	
36900	       {RESPONSE 6 PATIENT 4}
37000	
37100	{17} Dr. - YOU MEAN YOU FEEL IN 
37200	           DANGER EVEN HERE IN THE
37300	           HOSPITAL?
37400	     Pt. - YES      			  If it's true what has gone
37500						  on before, it must be a
37600						  frightening situation
37700						  for the patient.
37800	      {RESPONSE 6 PATIENT 4}
37900	
38000	{18} Dr. - THAT MUST REALLY BE AWFUL,  
38100	           HERE YOU COME TO THE HOSPITAL
38200	           FOR HELP AND YOU ARE STILL FRIGHTENED
38300	     Pt. - I CAN'T EVEN TRUST MY
38400	           DOCTOR ANYMORE                 Some heightening of feeling
38500						  this man is paranoid.
38600						  Genuinely frightened and 
38700	      {RESPONSE 6 PATIENT 5}             can count on no one for
38800			       			    help.
38900	
39000	{19} Dr. - WHAT WOULD BE OF HELP
39100	           TO YOU NOW?
39200	     Pt. - I'M GETTING SLEEPY NOW         Patient has organized a
39300	           AND HAVE TO GO TO BED.         system of people who will
39400	           THE NUMBER ONE GUY ALWAYS      harm him. The remark about
39500						  letters in the last name 
39600	           HAS FOUR AND FIVE LETTERS      gives some credence to 
39700	           IN HIS LAST NAME               idea this is delusional
39800						  system.
39900	
40000	        {RESPONSE 7 PATIENT 7}
40100	
40200	EXCERPT Nr. 4:
40300	This excerpt is from an interview by a different judge with the same
40400	patient as in excerpt 3.
40500	      I-O Pairs				Comments By Judge Nr. 4
40600	{1} Dr. - COULD YOU TELL ME A
40700	          LITTLE BIT ABOUT YOUR 
40800	          FEAR OF DYING AND IS THIS
40900	          SOMETHING YOU HAVE FELT
41000	          IN THE PAST?
41100	    Pt. - I HAVE FELT THIS FOR AL-      He responded appropriately.
41200	          MOST A YEAR                   Response is concrete.He
41300						doesn't tell me much
41400	     {RESPONSE 7 PATIENT 9}             about his fear of dying.
41500						 Somewhat guarded response.
41600	
41700	{2} Dr. - COULD YOU TELL ME A 
41800	          LITTLE MORE ABOUT YOUR
41900	          FEAR OF DYING AND HAVE
42000	          YOU ANY THOUGHTS ABOUT
42100	          HOW IT IS GOING TO HAPPEN
42200	          AND WHETHER ANYONE WANTS
42300	          TO HURT YOU?
42400	    Pt. - I WILL END UP IN THE SEA   	Somewhat strange  response.
42500	          AND THE HELL'S ANGELS WILL    Preoccupation with dying.
42600	          DO IT       			Hell's Angels symbolic,
42700						dangerous people in the
42800						culture. I doubt if he has
42900	      {RESPONSE 9 PATIENT 9}   		any contact with the Hell's
43000						Angels.
43100	
43200	{3} Dr. - COULD YOU TELL ME A 
43300	          LITTLE BIT ABOUT THE KINDS
43400	          OF CONTACT YOU HAVE HAD
43500	          WITH THE HELL'S ANGELS IN
43600	          THE PAST?
43700	    Pt. - I HAVE KNOWN SOME OF THEIR 	Answer hard to evaluate. He
43800	          DEALERS AND PUSHERS           may be telling the truth,
43900						it may be his fantasy. Maybe
44000						guy is in for drug addiction.
44100		{RESPONSE 6 PATIENT 9}		Somewhat concrete, guarded,
44200						and frightened.
44300	
44400	{4} Dr. - COULD YOU SAY A LITTLE
44500	          MORE ABOUT THE CIRCUMSTANCES
44600	          IN WHICH YOU HAVE KNOWN SOME 
44700	          OF THEIR DEALERS AND PUSHERS?
44800	    Pt. - THEY WERE MEMBERS OF MY    	It doesn't really answer the
44900	          COMMUNITY WHEN I GOT OUT      question, a little on a tan-
45000	          OF THE SERVICE THEY HAD       gent unconnected to the
45100	          BEEN MY FRIENDS FOR SO LONG   information I am asking. Does
45200						not tell me very much. Again
45300						guarded response.
45400	      {RESPONSE 6 PATIENT 8}
45500	
45600	{5} Dr. - DID YOU DEAL WITH THEM
45700	          YOURSELF AND HAVE YOU
45800	          BEEN ON DRUGS OR NAR-
45900	          COTICS EITHER NOW OR
46000	          IN THE PAST?
46100	    Pt. - YES I HAVE IN THE PAST     	To differentiate him from
46200	          BEEN ON MARIHUANA REDS        previous patient, at least
46300	          BENNIES LSD       		there is a certain amount
46400						of appropriateness to the
46500						answer although it doesn't
46600						tell me much about what I
46700	       {RESPONSE 3 PATIENT 7}		asked at least it's not
46800						bizarre. If I had him in my
46900						 office I would feel con-
47000						fident I could get more
47100						information if I didn't
47200						have to go through the
47300						teletype. He's a little more
47400						willing to talk than the
47500						 previous person. Answer
47600						to the question is fairly
47700						appropriate though not 
47800						extensive. Much less of a
47900						flavor of paranoia than
48000						any of previous responses.
48100	
48200	{6} Dr. - COULD YOU TELL ME HOW      	
48300	          LONG YOU HAVE BEEN IN THE
48400	          HOSPITAL AND SOMETHING
48500	          ABOUT THE CIRCUMSTANCES
48600	          THAT BROUGHT YOU HERE?
48700	    Pt. - CLOSE TO A YEAR AND		Response somewhat appropriate 
48800	          PARANOIA BROUGHT ME 		but doesn't tell me much.
48900	          HERE				The fact that he uses the
49000						word paranoia in the way
49100						 that he does without
49200	      {RESPONSE 5 PATIENT 7}		any other information,
49300						indicates maybe its a label 
49400						he picked up on the ward 
49500	                                        or from his doctor.
49600						Lack of any kind of under-
49700						standing about  himself.
49800						Dearth, lack of information.
49900						He's in some remission. Seems
50000						somewhat like a put-on. Seems
50100						he was paranoid and is in 
50200						some remission at this time.
50300	
50400	{7} Dr. - COULD YOU SAY SOMETHING
50500	          NOW ABOUT YOUR PARANOID 
50600	          FEELINGS BOTH AT THE 
50700	          TIME OF ADMISSION AND
50800	          DO YOU HAVE SIMILAR FEELINGS
50900	          NOW AND IF SO HOW DO THEY 
51000	          AFFECT YOU?
51100	    Pt. - AT THE TIME OF ADMISSION	This response moves paranoia 
51200	          I THOUGHT THE MAFIA WAS  	back up. Stretching reality 
51300	          AFTER ME AND NOW ITS THE	somewhat to think Hell's Angels 
51400	          HELL'S ANGELS			are still interested in him.
51500						Somewhat bizarre in terms of 
51600	                                        content. Quite paranoid.
51700	      {RESPONSE 8 PATIENT 9}		Still paranoid. Gross and primitive
51800						responses. In middle of interview I
51900						felt patient was in touch but now
52000						responses have more concrete aspect
52100	
52200	{8} Dr. - DO YOU HAVE ANY THOUGHT
52300	          AS TO WHY THESE TWO
52400	          GROUPS WERE AFTER YOU?
52500	    Pt. - BECAUSE I STOPPED SOME 	Response seems far fetched 
52600	          OF THEIR DRUG SUPPLY		and hard to believe unless 
52700						he was a narcotic agent which 
52800						I doubt. Sounds somewhat 
52900	      {RESPONSE 9 PATIENT 9}		grandiose, magical, paranoid
53000						flavor, in general indicates 
53100						he's psychotic, paranoid 
53200						schizophrenic with delusions  
53300						about these two groups and 
53400						I wouldn't rule out
53500						some hallucinations as well.
53600						Appropriateness of response 
53700						answers question in concrete 
53800						but unbelievable way.
53900	
54000	6.5 ANALYSIS (1)
54100		The prospective protocol judges (N=105) were selected from the 1970
54200	American  Psychiatric  Association  Directory using a table of random
54300	numbers. They  were  initially  not  informed  that  a  computer  was
54400	involved.    Each was sent transcripts of  three  interviews  along
54500	with a cover letter requesting their participation in the experiment.
54600	The interview transcripts consisted of:
54700		1) An interview conducted by one of the eight interview judges
54800		   with the paranoid model,
54900		2) An interview conducted by the same interview judge with a
55000		   human paranoid patient, and
55100		3) An interview conducted by a different psychiatrist with a
55200		   human patient who was not clinically paranoid.
55300	
55400		The  105 names were divided into eight groups. Each member of
55500	a group received transcripts of (1)  an  interview  with  a  paranoid
55600	patient, (2) an interview with the paranoid model (both (1)  and  (2)
55700	were performed by the same interview judge) and (3) an interview with
55800	one of the nonparanoid patients. The transcripts were printed so that
55900	after each input-output pair there were two lines of  rating  numbers
56000	such  that  the protocol judges could circle numbers corresponding to
56100	their ratings of both the previous responses of the patient,  and  an
56200	overall  evaluation  of  the  patient  on  the  paranoid   continuum.
56300	Thirty-three protocol judges (a good response rate  for  psychiatric
56400	questionnaires)  returned the rated protocols properly filled out and
56500	all were used in our data.
56600	
56700		The  interviews  with  nonparanoid  patients were included to
56800	control for the  hypothesis  that  any  teletyped  interview  with  a
56900	patient  might  be  judged  "paranoid". However, virtually all of the
57000	ratings of the nonparanoid interviews were 0 for paranoia. Hence the
57100	hypothesis was falsified.
57200	
57300	
57400		The first index of resemblance examined  was  simple,  namely
57500	the  final  overall  rating given the patient and the model; in other
57600	words, which was rated as  being  more  paranoid,  the  patient,  the
57700	model,  or  neither?   (See  Table  1). The protocol judges were more
57800	likely to distinguish the overall paranoid level of the model and the
57900	patient. In 37.5% of the paired interviews, the interview judges gave
58000	tied scores to the model and the patient as contrasted to only 9%  of
58100	the protocol judges.  Of the 35 non-tied paired ratings, 15 rated the
58200	model as being more paranoid.  If p is the theoretical probability of
58300	a  judge  judging  the  model  more  paranoid  than  a human paranoid
58400	patient, we find the 95% confidence interval for p to be .27 to  .59.
58500	Since  p=.5  indicates  indistinguishability  of  model  and  patient
58600	overall ratings and our observed p=.43, the results support the claim
58700	that the model is an adequate  simulation of a paranoid patient.
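
The interval quoted above can be checked with the usual normal approximation to a binomial proportion. This is a sketch under that assumption; the text does not say which method was actually used, and the variable names are introduced here for illustration.

```python
import math

model_higher = 15      # non-tied pairs in which the model was rated more paranoid
non_tied = 35          # paired ratings remaining after the ties are set aside

p_hat = model_higher / non_tied
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / non_tied)
lower, upper = p_hat - half_width, p_hat + half_width
print(f"p = {p_hat:.2f}, 95% interval = {lower:.2f} to {upper:.2f}")
```

Under this approximation the limits agree with the reported .27 to .59 to within about .01, and the interval contains p=.5.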
58800	
58900		Separate analysis of the strong  and  weak  versions  of  the
59000	paranoid model indicated that indeed the strong model was judged more
59100	paranoid than the paranoid patients, the weak version less  paranoid.
59200	Thus  a  change  in  the  parameter  structure  of the paranoid model
59300	produced a change along the dimension of  paranoid  behavior  in  the
59400	expected direction.
59500	
59600	TABLE 1
59700		Relative final overall ratings of paranoid model vs. paranoid
59800	patient indicating which was given highest overall rating of paranoia
59900	at end of interview.
60000		(INSERT TABLE 1 HERE)
60100	
60200	
60300	
60400	
60500	
60600	
60700	
60800	
60900	6.6 ANALYSIS (2)
61000		The second index of resemblance is a more  sensitive  measure
61100	based on the two series of response ratings in the paired interviews.
61200	The  statistic  used  is  basically  the  standardized   Mann-Whitney
61300	statistic (Siegel,1956).
61400			(INSERT EQUATION HERE)
61500	
61600	where R is the sum of the ranks of the response ratings in the series
61700	of ratings given to the model, n the number of responses given by the
61800	model,  m  the  number  of  responses  given  by the patient.  If the
61900	ratings given by a judge are randomly allocated to model and patient,
62000	i.e. model and patient are indistinguishable in response ratings, the
62100	expected value of Z is 0, with unit standard  deviation.   If  higher
62200	ratings  are  more  likely to be assigned to the model, Z is positive
62300	and, conversely, negative values of Z indicate greater likelihood  of
62400	assigning  higher  ratings to the patient. Each judge in evaluating a
62500	pair of interviews generates a single value of Z.
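
In its usual textbook form, and with the symbols defined above, the standardized rank-sum statistic is Z = (R - n(n+m+1)/2) / sqrt(nm(n+m+1)/12). The sketch below computes it under that assumption. Tied ratings receive average ranks, and no tie correction is applied to the variance, so this may differ in detail from the computation actually used. The function names and the sample ratings are invented for illustration.

```python
import math

def midranks(values):
    """Rank values from 1, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def standardized_rank_sum(model, patient):
    """Z for one pair of interviews; positive when the model is rated higher."""
    n, m = len(model), len(patient)
    ranks = midranks(list(model) + list(patient))
    r = sum(ranks[:n])                     # R: rank sum of the model's ratings
    expected = n * (n + m + 1) / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    return (r - expected) / sd

# Hypothetical response ratings from one judge for one pair of interviews.
z_model_higher = standardized_rank_sum([9, 9, 6], [1, 2, 5])
z_identical = standardized_rank_sum([1, 2, 3], [1, 2, 3])
print(round(z_model_higher, 2), z_identical)
```

Swapping the two series changes only the sign of Z, which matches the interpretation of positive and negative values given above.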
62600	
62700		The overall mean of the Z scores was +.044 with the  standard
62800	deviation 1.68 (df=40). Thus the overall 95% confidence interval  for
62900	the asymptotic mean value of Z is -.485 to +.573. The range of Z values
63000	is  -3.8  to +4.46. The length of the confidence interval is a result
63100	of the large variance which itself is mainly related to the  contrast
63200	between  the  weak and strong versions.  (See TABLES 2 and 3).   Once
63300	again the strong version of the  model  is  more  paranoid  than  the
63400	patients, the weak version less paranoid.
63500	
63600		(INSERT TABLE 2)
63700		(SUMMARY STATISTICS OF Z RATINGS BY GROUP)
63800	
63900	
64000	
64100	
64200	
64300	
64400	
64500	
64600	
64700		It is not surprising that results using the  two  indices  of
64800	resemblance  are parallel, since the indices are highly interrelated.
64900	The mean Z value was +1.28 for the 15 interviews in which the model
65000	was rated more paranoid, +.41 for the 6 in which model and patient
65100	tied, and -.993 for the 20 in which the patient was rated more
65200	paranoid. A positive value of Z was observed 6 times when the patient
65300	was given the higher overall rating; a negative value of Z was
65400	observed twice when the model was rated more paranoid.
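As a check on the summary figures, the sketch below recombines the three subgroup means quoted above into an overall mean and rebuilds a 95% interval around it from the reported standard deviation. It assumes a Student's t interval with 40 degrees of freedom, and the critical value 2.021 is hard-coded from a standard table; neither detail is stated in the text.

```python
from math import sqrt

# Subgroup sizes and mean Z values as quoted in the text.
groups = [(15, 1.28), (6, 0.41), (20, -0.993)]

n = sum(size for size, _ in groups)                      # 41 judges in all
mean_z = sum(size * mean for size, mean in groups) / n   # size-weighted mean

SD = 1.68        # standard deviation quoted in the text (df = 40)
T_CRIT = 2.021   # two-sided 95% point of Student's t with 40 df (table value)

half_width = T_CRIT * SD / sqrt(n)
lower, upper = mean_z - half_width, mean_z + half_width
print(f"mean = {mean_z:+.3f}, 95% CI = {lower:+.3f} to {upper:+.3f}")
```

Under these assumptions the recomputed limits agree with the reported interval of -.485 to +.573 to within rounding of the quoted inputs.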
65500	
65600	(INSERT TABLE 3)
65700	(Analysis of Variance of Z Ratings)
65800	
65900	
66000	
66100	
66200	
66300	
66400	
66500	
66600	
66700	
66800	
66900	
67000	
67100		It is worth emphasizing that these tests  invited  refutation
67200	of the model.   The experimental design of the tests put the model in
67300	jeopardy  of  falsification.   If  the paranoid model did not survive
67400	these tests, i.e.    if it were not  considered  paranoid  by  expert
67500	judges  and  if  there  were  no  correlation between the weak-strong
67600	versions of the model and the severity ratings of the judges, then no
67700	claim  regarding  the  success  of  the  simulation  could  be  made.
67800	Survival of a falsification procedure constitutes a validating step.
67900	
68000	6.7 ANALYSIS (3) THE MACHINE QUESTION
68100		For hundreds of years people have wondered how to distinguish
68200	a man from an imitation of a man. To distinguish a man from a statue,
68300	Galileo suggested tickling each with a feather.  To distinguish a man
68400	from a machine Descartes  suggested  conversational  tests.  Turing's
68500	conversational games have been discussed on p.00. We were curious how
68600	judges using transcripts might  answer  the  machine  question,  i.e.
68700	which interviewee is a human and which is the computer model?
68800		To ask the machine-question, we sent  interview  transcripts,
68900	one   with  a  patient  and  one  with  the  model  (PARRY),  to  100
69000	psychiatrists  randomly  selected  from  the  Directory  of  American
69100	Specialists   and   the   Directory   of   the  American  Psychiatric
69200	Association.  Of  the  41 replies,  21   (51%)   made   the   correct
69300	identification  while  20  (49%)  were  wrong.   Based on this random
69400	sample of 41 psychiatrists, the 95% confidence interval is between
69500	35.9% and 66.5%, a range which includes the chance level of 50%.
69600		Psychiatrists   are   considered  expert  judges  of  patient
69700	interview behavior but they are unfamiliar with computers.  Hence  we
69800	conducted  the  same  test  with  100  computer  scientists  randomly
69900	selected from the membership list of the  Association  for  Computing
70000	Machinery,  ACM.   Of the 67 replies 32 (48%) were right and 35 (52%)
70100	were wrong. Based on this random sample of 67 computer scientists the
70200	95% confidence interval ranges from 36% to 60%, again spanning the
70300	chance level of 50%.
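The intervals quoted for the two samples can be reproduced with a short calculation. The sketch assumes the normal approximation to the binomial (the Wald interval); the text does not name the method, and proportion_ci is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(correct, total, level=0.95):
    """Normal-approximation (Wald) interval for a proportion, in percent."""
    p = correct / total
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    half_width = z * sqrt(p * (1 - p) / total)
    return 100 * (p - half_width), 100 * (p + half_width)


for label, correct, total in [("psychiatrists", 21, 41),
                              ("computer scientists", 32, 67)]:
    low, high = proportion_ci(correct, total)
    print(f"{label}: {low:.1f}% to {high:.1f}%")
```

Both intervals contain 50%, the proportion expected if the judges were guessing.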
70400		So both computer scientists and psychiatrists were unable, at
70500	better than a chance level, to distinguish transcripts of interviews
70600	with the model from transcripts of interviews with real patients.
70700		But  what  do  we  learn from asking the machine question and
70800	finding that the distinction is not made? What we would most like  to
70900	know  is  how  to improve the model.  Simulation models do not spring
71000	forth in a complete, perfect and final form; they must  be  gradually
71100	developed over time. Perhaps the patient-model distinction might be
71200	made if we allowed a large number of expert  judges  to  conduct  the
71300	interviews  themselves  rather  than  studying  transcripts  of other
71400	interviewers. Such a result would indicate that the model must be
71500	improved, but unless we systematically investigated how the judges
71600	succeeded in making the discrimination we would not know what aspects
71700	of the model to work on. The logistics of such a design are immense,
71800	and obtaining a large number of judges for sound statistical inference
71900	would require an effort incommensurate with the information yielded.
72000	
72100	6.8 ANALYSIS (4)  MULTIDIMENSIONAL EVALUATION 
72200		A more efficient and informative way to use Turing-like tests
72300	is  to  ask  judges  to  make  ratings  along  scaled dimensions from
72400	teletyped interviews. This  might be  called  asking  the  "dimension
72500	question".  One  can  then compare scaled ratings of the patients and
72600	the model in order to precisely determine where and by how much  they
72700	differ.  In  constructing  our  model  we strove for one which showed
72800	indistinguishability along  some  dimensions  and  distinguishability
72900	along  others. That is, the model converges on what it is supposed to
73000	simulate and diverges from that which it is not.
73100		Paired-interview   transcripts   were  sent  to  another  400
73200	randomly selected psychiatrists asking them to rate the responses  of
73300	the two `patients' along multiple dimensions. The judges were divided
73400	into groups, each judge being asked to rate  responses  of  each  I-O
73500	pair  in  the  interviews along four dimensions.  The total number of
73600	dimensions in this test was twelve: linguistic noncomprehension,
73700	thought  disorder,  organic brain syndrome, bizarreness, anger, fear,
73800	ideas of reference, delusions, mistrust,  depression,  suspiciousness
73900	and  mania.  There  were  three  groups  of  judges, each group being
74000	assigned  4  of  the  12  dimensions.   These  are  dimensions  which
74100	psychiatrists commonly use in evaluating patients.
74200			(INSERT TABLE 4 HERE)
74300		Table 4 shows there were significant  differences,  with  the
74400	model  (PARRY)  receiving  higher  scores  along  the  dimensions  of
74500	linguistic noncomprehension, thought disorder, bizarreness, anger,
74600	mistrust  and  suspiciousness. On the delusion dimension the patients
74700	were  rated  significantly  higher.   There   were   no   significant
74800	differences along the dimensions of organic brain syndrome, fear,
74900	ideas of reference, depression and mania.
75000		Whereas    tests   asking   the   machine-question   indicate
75100	indistinguishability at the gross level, a  study  of  the  finer  or
75200	micro-structure  of the model's behavior through ratings along scaled
75300	dimensions shows significant differences between patients and  model.
75400	These differences are of help to us in suggesting what aspects of the
75500	model should be modified to improve its  performance.  The  graph  of
75600	Fig. 2  shows that no modifications are necessary along the dimension
75700	of "organic  brain  syndrome". But  it  is  clear  that  the  model's
75800	language-comprehension  might  be  improved. Then a future test would
75900	tell whether improvement has occurred and  by  how  much.  Successive
76000	identification  of  particular  areas  of  failure provides a type of
76100	sensitivity analysis which makes clear what  improvements  should  be
76200	pursued in developing more adequate model versions.
76300		(INSERT FIG. 2 HERE)
76400	
76500	6.9 ANALYSIS (5)  A RANDOM MODEL
76600		Further evidence that the machine-question is too coarse  and
76700	insensitive  a test comes from the following experiment. In this test
76800	we constructed a random version of the paranoid model which  utilized
76900	PARRY's output statements but expressed them randomly no matter what
77000	the interviewer said.   Two psychiatrists conducted  interviews  with
77100	this model, transcripts of which were paired with patient  interviews
77200	and  sent  to  200  randomly  selected  psychiatrists asking both the
77300	machine-question and the dimension-question.  Of the  69  replies  to
77400	the  machine  question, 34 (49%) were right and 35 (51%) wrong. Based
77500	on this  random  sample  of  69  psychiatrists,  the  95%  confidence
77600	interval  ranges  from  39  to 63, again indicating a chance level of
77700	agreement. When a poor model, such as a random one, passes a test, it
77800	suggests the test is weak.
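The random version can be pictured as a responder that draws from a fixed stock of statements and never consults its input. The sketch below is only an illustration of that idea: the statements are placeholders, since the model's actual output statements are not listed in this chapter, and the names are hypothetical.

```python
import random

# Placeholder strings; the model's real output statements are not given here.
CANNED_STATEMENTS = [
    "<output statement 1>",
    "<output statement 2>",
    "<output statement 3>",
]


def random_reply(interviewer_input, rng=random):
    """Return a stored statement chosen at random, ignoring the input."""
    del interviewer_input  # deliberately unused: the reply never depends on it
    return rng.choice(CANNED_STATEMENTS)
```

Because the interviewer's input is discarded, two different questions produce the same reply whenever the random generator is in the same state.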
77900		(INSERT TABLE 5 HERE)
78000		Although a distinction is not made when  the  simple  machine
78100	question is asked, definite distinctions ARE made when judgements are
78200	requested  along  specific  dimensions.    As  shown  in   Table   5,
78300	significant  differences  appear  along  the dimensions of linguistic
78400	noncomprehension, thought disorder and bizarreness, with RANDOM-PARRY
78500	rated  higher.   On  these  particular  dimensions we can construct a
78600	continuum in which the random version  represents  one  extreme,  the
78700	actual patients another. Nonrandom PARRY lies somewhere between these
78800	two extremes, indicating that it performs significantly  better  than
78900	the random version but still requires improvement before it can be
79000	considered   indistinguishable   from   patients  relative  to  these
79100	dimensions. Table 6 presents t values for  differences  between  mean
79200	ratings  of  PARRY  and  RANDOM-PARRY. (See Table 6 and Fig.2 for the
79300	mean ratings).
79400		(INSERT TABLE 6 AND FIG 2 HERE)
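A t value for a difference between two mean ratings can be computed as sketched below. The text does not say which form of the t test was used; the sketch assumes Welch's unequal-variance form, and the ratings are invented for illustration only and are not data from the study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """t statistic for a difference of means, allowing unequal variances."""
    standard_error = sqrt(variance(sample_a) / len(sample_a)
                          + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error


# Hypothetical ratings on one dimension, on an arbitrary scale.
random_parry = [7, 8, 6, 9, 7, 8]
parry = [4, 5, 3, 5, 4, 6]
print(f"t = {welch_t(random_parry, parry):.2f}")
```

A positive t here means the first sample has the higher mean rating; reversing the arguments changes only the sign.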
79500		These studies indicate that a more useful way to use Turing-like
79600	tests is  to  ask  expert  judges  to  make  ratings  along  multiple
79700	dimensions  that  are  essential  to  the model.   Thus the model can
79800	serve as an instrument for its  own  perfection.  A  good  validation
79900	procedure  has  criteria  for better or worse approximations.  Useful
80000	tests do not necessarily prove a model; they probe it for its
80100	strengths  and  weaknesses  and  clarify  what  is to be done next in
80200	modifying and repairing the model. Simply asking the machine-question
80300	yields  little  information  relevant  to what the model builder most
80400	wants to know, namely, along which dimensions does the model need  to
80500	be modified in order to effect an improvement in its performance.
80600	
80700		To conclude, it  is  perhaps  historically  significant  that
80800	these  tests  were  conducted at all. To my knowledge, no one to date
80900	has subjected his simulation model of  human  symbolic  processes  to
81000	indistinguishability tests. These tests set a precedent and provide a
81100	standard for competing models to be measured against.